Fusing Automatically Extracted Annotations for the Semantic Web
This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the database and formal logic research communities, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. In order to be reusable, a fusion system must be able to select appropriate techniques and use them in combination.
Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. To handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we address the problem of data fusion in the presence of schema heterogeneity. We extended the KnoFuss framework to exploit the results of automatic schema alignment tools and proposed our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. Experiments with this approach showed a substantial improvement in performance in comparison with public data repositories.
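The abstract does not spell out the belief-propagation algorithm itself, but the Dempster-Shafer machinery it builds on is standard. As a rough illustration, here is Dempster's rule of combination for two mass functions over a small frame of discernment; the coreference scenario and all mass values are illustrative, not taken from KnoFuss:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Mass functions map frozensets (subsets of the frame of
    discernment) to belief mass. Mass assigned to conflicting
    (empty-intersection) combinations is discarded and the
    remainder renormalised.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are completely conflicting")
    k = 1.0 - conflict
    return {s: w / k for s, w in combined.items()}

# Two unreliable sources give evidence on whether two records corefer.
same, different = frozenset({"same"}), frozenset({"different"})
either = same | different  # mass on the whole frame = ignorance

m1 = {same: 0.7, either: 0.3}
m2 = {same: 0.6, different: 0.1, either: 0.3}

m12 = combine(m1, m2)  # fused belief assignment
```

Combining the two sources concentrates mass on the "same" hypothesis while keeping the result a proper mass function (values summing to one), which is what makes the rule usable for propagating uncertainty through a fused knowledge base.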
How much semantic data on small devices?
Semantic tools such as triple stores, reasoners and query engines tend to be designed for large-scale applications. However, with the rise of sensor networks, smartphones and smart appliances, new scenarios appear where small devices with restricted resources have to handle limited amounts of data. It is therefore important to assess how existing semantic tools behave on such small devices, and how much data they can reasonably handle. There exist benchmarks for comparing triple stores and query engines, but these benchmarks target large-scale applications and would not be applicable in the considered scenarios. In this paper, we describe a set of small to medium scale benchmarks explicitly targeting applications on small devices. We describe the result of applying these benchmarks to three different tools (Jena, Sesame and Mulgara) on the smallest existing netbook (the Asus EEE PC 700), showing how they can be used to test and compare semantic tools in resource-limited environments.
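The paper's measurements come from the actual triple stores; purely as a sketch of the shape such a benchmark harness takes, here is a minimal timing loop. The repeated runs and median reporting matter on constrained devices, where single-run variance is high. The in-memory triple list and lookup task are stand-ins for a real store's load/query API:

```python
import statistics
import time

def benchmark(task, repeats=5):
    """Run `task` several times; report best and median wall-clock time.

    On small devices a single run is not representative, so we keep
    the median across repeats alongside the best case.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return {"best": min(timings), "median": statistics.median(timings)}

# Illustrative stand-in for "load N triples, then run a lookup";
# a real benchmark would drive Jena, Sesame or Mulgara instead.
triples = [("s%d" % i, "p", "o%d" % (i % 10)) for i in range(10_000)]

def lookup_task():
    return [t for t in triples if t[2] == "o3"]

result = benchmark(lookup_task)
```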
Building SPARQL-Enabled Applications with Android devices
In this paper, we show how features can be added to an Android device (a smartphone) to enable mobile applications to expose data through a SPARQL endpoint. Using simple query federation mechanisms, we describe a demonstrator illustrating how SPARQL-enabled Android devices can allow us to rapidly develop applications mashing up data from a collaborative network of sensor-based data sources.
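The federation step described above amounts to joining variable bindings retrieved from several endpoints. As a minimal sketch, the following simulates two devices' endpoints as lists of binding dicts and joins them on a shared variable; in a real deployment the bindings would be fetched over HTTP from each device's SPARQL endpoint, and all names and data here are illustrative:

```python
def federate(bindings_a, bindings_b, join_var):
    """Naive hash join of result bindings from two SPARQL endpoints.

    Each "endpoint" is simulated as a list of variable-binding dicts;
    rows are joined on the value of `join_var`.
    """
    index = {}
    for row in bindings_a:
        index.setdefault(row[join_var], []).append(row)
    joined = []
    for row in bindings_b:
        for match in index.get(row[join_var], []):
            joined.append({**match, **row})
    return joined

# One phone exposes sensor readings, another exposes sensor locations.
readings = [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.0}]
locations = [{"sensor": "s1", "room": "kitchen"}]

rows = federate(readings, locations, "sensor")
```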
Identifying relevant sources for data linking using a semantic web index
With more data repositories constantly being published on the Web, choosing appropriate data sources to interlink with newly published datasets becomes a non-trivial problem. While catalogs of data repositories and meta-level descriptors such as VoiD provide valuable information for making these decisions, more detailed information about the instances included in repositories is often required to assess the relevance of datasets and the part of the dataset to link to. However, retrieving and processing such information for a potentially large number of datasets is practically infeasible. In this paper, we examine how using an existing semantic web index can help identify candidate datasets for linking. We further apply ontology schema matching techniques to rank these candidate datasets and extract the sub-dataset to use for linking, in the form of classes whose instances are more likely to match those of the local dataset.
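The ranking step can be pictured as scoring each candidate dataset by how well its class labels match the local schema. The sketch below uses plain string similarity as a stand-in for the paper's ontology schema matching techniques; the dataset names and class labels are illustrative:

```python
from difflib import SequenceMatcher

def dataset_score(local_classes, candidate_classes):
    """Score a candidate dataset against the local schema.

    For each local class label, keep the best string similarity
    against the candidate's labels, then average. Real schema
    matchers use richer evidence than label strings.
    """
    best = []
    for local in local_classes:
        best.append(max(
            SequenceMatcher(None, local.lower(), cand.lower()).ratio()
            for cand in candidate_classes
        ))
    return sum(best) / len(best)

local = ["Person", "Organization"]
candidates = {
    "dbpedia": ["Person", "Organisation", "Place"],
    "geonames": ["Feature", "Country"],
}
ranked = sorted(candidates,
                key=lambda name: dataset_score(local, candidates[name]),
                reverse=True)
```

A local dataset describing people and organisations would here rank the candidate with matching class labels first, identifying both the dataset and the sub-schema worth linking against.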
Capturing emerging relations between schema ontologies on the Web of Data
Semantic heterogeneity caused by the use of different ontologies to describe the same topics represents an obstacle for many data integration tasks on the Web of Data, in particular, discovering relevant repositories for interlinking and comparing repositories with respect to the coverage of specific domains. To facilitate these tasks, mappings between schema terms are needed alongside the links between instances. Currently, explicitly specified schema-level mappings are scarce in comparison with instance-level links. However, by analysing existing instance-level links it is possible to capture correspondences between the classes to which these instances belong. In our experiments, we applied this approach on a large scale to generate schema-level mappings between several Linked Data repositories. The results of these experiments provide some interesting insights into the use of ontologies on the Web of Data and the schema-level relations that emerge from existing data-level interlinks.
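The core idea, deriving class correspondences from instance-level links, can be sketched very simply: count how often class pairs co-occur across linked instances and keep pairs with sufficient support. This mirrors the idea, though not the exact method or scoring used in the experiments; all identifiers and the support threshold are illustrative:

```python
from collections import Counter

def infer_class_mappings(links, classes_a, classes_b, min_support=2):
    """Infer schema-level class correspondences from instance links.

    `links` holds (instance_a, instance_b) pairs (e.g. owl:sameAs);
    `classes_a` / `classes_b` map each instance to its set of classes.
    Class pairs co-occurring across at least `min_support` links are
    reported with their counts.
    """
    counts = Counter()
    for a, b in links:
        for ca in classes_a.get(a, ()):
            for cb in classes_b.get(b, ()):
                counts[(ca, cb)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

links = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
classes_a = {"a1": {"ex:Musician"}, "a2": {"ex:Musician"},
             "a3": {"ex:City"}}
classes_b = {"b1": {"dbo:MusicalArtist"}, "b2": {"dbo:MusicalArtist"},
             "b3": {"dbo:Place"}}

mappings = infer_class_mappings(links, classes_a, classes_b)
```

With two links between musicians, the pair (ex:Musician, dbo:MusicalArtist) clears the support threshold, while the single city link does not; at Linked Data scale such counts would additionally be normalised by class sizes.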
Unsupervised learning of link discovery configuration
Discovering links between overlapping datasets on the Web is generally realised through the use of fuzzy similarity measures. Configuring such measures is often a non-trivial task that depends on the domain, the ontological schemas, and formatting conventions in the data. Existing solutions either rely on the user's knowledge of the data and the domain, or on the use of machine learning to discover these parameters from training data. In this paper, we present a novel approach to data linking which relies on the unsupervised discovery of the required similarity parameters. Instead of using labelled data, the method takes into account several desired properties which the distribution of output similarity values should satisfy. The method encodes these properties in a fitness criterion used in a genetic algorithm to establish similarity parameters that maximise the quality of the resulting linkset according to the considered properties. We show in experiments using benchmarks as well as real-world datasets that such an unsupervised method can reach the same levels of performance as manually engineered methods, and how the different parameters of the genetic algorithm and the fitness criterion affect the results for different datasets.
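The abstract does not state the exact distribution properties used, but one commonly used unsupervised proxy is to reward parameter settings under which the similarity scores split cleanly into a high (match) and a low (non-match) group. The sketch below implements such a gap-based fitness over a weighted two-feature similarity; a plain grid search over the weight stands in for the genetic algorithm, and the records, features and fitness definition are all illustrative assumptions:

```python
from difflib import SequenceMatcher

def similarity(rec_a, rec_b, weight):
    """Weighted combination of a name similarity and a year match
    (both features are illustrative)."""
    name_sim = SequenceMatcher(None, rec_a["name"], rec_b["name"]).ratio()
    year_sim = 1.0 if rec_a["year"] == rec_b["year"] else 0.0
    return weight * name_sim + (1.0 - weight) * year_sim

def gap_fitness(scores):
    """Unsupervised fitness: reward a large gap in the sorted score
    distribution, i.e. a configuration under which candidate pairs
    separate into clearly-high and clearly-low similarities."""
    s = sorted(scores)
    return max(b - a for a, b in zip(s, s[1:]))

records_a = [{"name": "john smith", "year": 1980},
             {"name": "mary jones", "year": 1975}]
records_b = [{"name": "jon smith", "year": 1980},
             {"name": "anna lee", "year": 1990}]

# Grid search over the weight stands in for the genetic algorithm;
# the real method evolves many similarity parameters at once.
best_weight, best_fitness = None, -1.0
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    scores = [similarity(a, b, w) for a in records_a for b in records_b]
    fit = gap_fitness(scores)
    if fit > best_fitness:
        best_weight, best_fitness = w, fit
```

The appeal of this formulation is that fitness is computed from the score distribution alone, with no labelled link examples, which is exactly what makes the configuration step unsupervised.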
Data linking: capturing and utilising implicit schema-level relations
Schema-level heterogeneity represents an obstacle for the automated discovery of coreference resolution links between individuals. Although there is a multitude of existing schema matching solutions, the Linked Data environment differs from the standard scenario assumed by these tools. In particular, large volumes of data are available, and repositories are connected into a graph by instance-level mappings. In this paper, we describe how these features can be utilised to produce schema-level mappings which facilitate the instance coreference resolution process. Initial experiments applying this approach to public datasets have produced encouraging results.
Position paper on realizing smart products: challenges for Semantic Web technologies
In the rapidly developing space of novel technologies that combine sensing and semantic technologies, research on smart products has the potential of establishing a research field in itself. In this paper, we synthesize existing work in this area in order to define and characterize smart products. We then reflect on a set of challenges that semantic technologies are likely to face in this domain. Finally, in order to initiate discussion at the workshop, we sketch an initial comparison of smart products and semantic sensor networks from the perspective of knowledge technologies.
The SemSearchXplorer - exploring semantic search results with semantic visualizations
SemSearchXplorer is a toolkit for the exploration of semantic data. Its goal is to lower user barriers to accessing information in semantic data repositories. SemSearchXplorer therefore supports the user in three respects: (1) it supports querying of the semantic data with a keyword-based approach, so users do not need to learn a semantic query language; (2) it helps users find relevant results, both by using semantically enriched information about the results and semantic filter options to narrow down the result set; and (3) it provides information exploration capabilities through semantic visualizations recommended by the system. Filtering semantic search results helps narrow the result set down to a more manageable amount of information. Beyond searching for relevant information, facilities for exploring the results help users gain insight into the context of results. With several semantic visualizations, we try to help users make sense of the raw data. Based on the assumption that there is no single visualization that fits all exploration needs, SemSearchXplorer recommends visualizations based on the information users select.
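The keyword-based access in point (1) can be pictured as matching query keywords against the literal values attached to resources and ranking resources by how many keywords they cover. This toy sketch ignores the semantic enrichment the system actually performs, and all data in it is illustrative:

```python
def keyword_search(triples, query):
    """Rank subjects by how many distinct query keywords appear in
    their property values. A real semantic search engine would also
    exploit class and property semantics, not just literals."""
    keywords = set(query.lower().split())
    hits_per_subject = {}
    for subj, _pred, obj in triples:
        hits = {kw for kw in keywords if kw in str(obj).lower()}
        if hits:
            hits_per_subject[subj] = hits_per_subject.get(subj, set()) | hits
    return sorted(hits_per_subject,
                  key=lambda s: len(hits_per_subject[s]),
                  reverse=True)

triples = [
    ("ex:turing", "name", "Alan Turing"),
    ("ex:turing", "field", "computer science"),
    ("ex:lovelace", "name", "Ada Lovelace"),
    ("ex:lovelace", "field", "mathematics and computing"),
]
results = keyword_search(triples, "turing computer")
```

A user typing plain keywords gets back ranked resources without ever writing a SPARQL query, which is the barrier-lowering point (1) makes.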